Article 5115

Title of the article

THE EFFECT OF REDUCING THE SIZE OF A TEST SAMPLE BY SWITCHING TO
MULTIDIMENSIONAL STATISTICAL ANALYSIS OF BIOMETRIC DATA

Authors

Volchikhin Vladimir Ivanovich, Doctor of engineering sciences, professor, President of Penza State University (40 Krasnaya street, Penza, Russia), cnit@pnzgu.ru
Ivanov Aleksandr Ivanovich, Doctor of engineering sciences, associate professor, head of laboratory of biometric and neural-network technologies, Penza Research Electrotechnical Institute (9 Sovetskaya street, Penza, Russia), ivan@pniei.penza.ru
Serikova Natal'ya Igorevna, Engineer-programmer, research and production enterprise “Rubin” (2 Baydukova street, Penza, Russia), s.kachalin@gmail.com
Funtikova Yuliya Vyacheslavovna, Engineer-programmer, Penza Research Electrotechnical Institute (9 Sovetskaya street, Penza, Russia), pniei@penza.ru

Index UDK

519.7; 519.66; 57.087.1; 612.087.1

Abstract

Background. Most industrial methods of data quality assessment currently rely on the classical chi-square criterion, which performs well on large test samples. When the quality of training and test samples of biometric data is assessed, however, large test samples of 200 experimentally obtained values are not available: artificial neural networks are usually trained and tested on samples of 20 examples. This raises the pressing problem of reducing (decimating) the size of experimentally obtained samples while preserving the reliability of the statistical analysis results.
Materials and methods. The work analyzes the influence of the biometric data quantization error that arises when the distribution density function of experimental data values is approximated by a histogram. It is shown that constructing histograms of the value distribution density significantly aggravates the data quantization errors caused by the small number of examples in a sample.
Results. The authors propose abandoning histograms in favor of approximating the probability function of the occurrence of observed events. This is equivalent to moving from the chi-square statistical criterion to the Gini statistical criterion. For small samples, the quantization error decreases fivefold when a one-dimensional Gini criterion is used. An even greater reduction of the effect of quantization errors can be achieved with a multidimensional generalized Gini criterion. It is proved that the effect of quantization errors is proportional to the root of the dimension used in the Gini criterion.
Conclusions. Moving from a one-dimensional chi-square criterion for testing statistical hypotheses to the multidimensional Gini criterion makes it possible to significantly lower the requirements on the size of training and test samples of biometric data. This opens an opportunity to improve the quality of training and testing of the artificial neural networks of biometrics-to-code converters through multidimensional statistical control of training and test samples.
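
The contrast drawn in the abstract between a histogram-based statistic and one built on the probability (distribution) function can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the histogram statistic is the usual Pearson chi-square against a normal law, and the "Gini-type" statistic is taken to be the integral of the absolute gap between the empirical and theoretical distribution functions. The function names, bin edges, grid size, and the use of the coefficient of variation as a spread measure are all hypothetical choices, and the sketch is not expected to reproduce the fivefold figure reported by the authors.

```python
import math
import random
import statistics


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of the normal law N(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def chi_square_statistic(sample, edges, mu=0.0, sigma=1.0):
    """Pearson chi-square: histogram counts vs. expected counts under N(mu, sigma)."""
    n = len(sample)
    # The outer bins are open-ended so that the expected probabilities sum to 1.
    bounds = [-math.inf] + list(edges) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for x in sample if lo <= x < hi)
        p_lo = 0.0 if lo == -math.inf else normal_cdf(lo, mu, sigma)
        p_hi = 1.0 if hi == math.inf else normal_cdf(hi, mu, sigma)
        expected = n * (p_hi - p_lo)
        stat += (observed - expected) ** 2 / expected
    return stat


def cdf_distance_statistic(sample, mu=0.0, sigma=1.0, grid_points=2001, span=5.0):
    """Assumed Gini-type statistic: Riemann-sum approximation of the integral
    of |F_empirical(x) - F_theoretical(x)| over [mu - span*sigma, mu + span*sigma]."""
    xs = sorted(sample)
    n = len(xs)
    lo = mu - span * sigma
    step = 2.0 * span * sigma / (grid_points - 1)
    total = 0.0
    idx = 0  # number of sample values <= current grid point
    for k in range(grid_points):
        g = lo + k * step
        while idx < n and xs[idx] <= g:
            idx += 1
        total += abs(idx / n - normal_cdf(g, mu, sigma))
    return total * step


def compare_spread(n=20, trials=200, seed=1):
    """Coefficient of variation of both statistics over repeated small samples."""
    rng = random.Random(seed)
    edges = [-1.0, 0.0, 1.0]  # hypothetical 4-bin histogram
    chi_vals, cdf_vals = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        chi_vals.append(chi_square_statistic(sample, edges))
        cdf_vals.append(cdf_distance_statistic(sample))

    def cv(values):
        return statistics.pstdev(values) / statistics.mean(values)

    return cv(chi_vals), cv(cdf_vals)


if __name__ == "__main__":
    cv_chi, cv_cdf = compare_spread()
    print(f"relative spread, chi-square: {cv_chi:.3f}")
    print(f"relative spread, CDF-based:  {cv_cdf:.3f}")
```

The chi-square statistic depends on where the bin edges fall, which is one source of the quantization effect the abstract describes, whereas the distribution-function statistic uses every sample value without binning.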

Key words

biometric data, statistical data processing, Gini criterion, chi-square criterion.

References

1. GOST R 52633.5–2011. Zashchita informatsii. Tekhnika zashchity informatsii. Avtomaticheskoe obuchenie neyrosetevykh preobrazovateley biometriya–kod dostupa [Data protection. Data protection technique. Automatic training of neural-network converters of biometrics-code access]. Moscow, 2011.
2. GOST R 52633.3–2011. Zashchita informatsii. Tekhnika zashchity informatsii. Testirovanie stoykosti sredstv vysokonadezhnoy biometricheskoy zashchity k atakam podbora [Data protection. Data protection technique. Testing the resistance of highly reliable biometric protection means to brute-force attacks]. Moscow, 2011.
3. Available at: http://pniei.rf/activity/science/noc.htm.
4. R 50.1.037–2002. Prikladnaya statistika. Pravila proverki soglasiya opytnogo raspredeleniya s teoreticheskim. Chast' II. Neparametricheskie kriterii [Applied statistics. Rules of checking concord between experimental and theoretical distribution. Part II. Non-parametric tests]. Moscow: Gosstandart Rossii, 2002.
5. Kobzar' A. I. Prikladnaya matematicheskaya statistika. Dlya inzhenerov i nauchnykh rabotnikov [Applied mathematical statistics. For engineers and scientific staff]. Moscow: FIZMATLIT, 2006, 816 p.
6. Akhmetov B. S., Volchikhin V. I., Ivanov A. I., Malygin A. Yu. Algoritmy testirovaniya biometriko-neyrosetevykh mekhanizmov zashchity informatsii Kazakhstan [Testing algorithms for biometric neural-network mechanisms of data protection of Kazakhstan]. Almaty: KazNTU im. Satpaeva, 2013, 152 p.
7. Akhmetov B. S., Nadeev D. N., Funtikov V. A., Ivanov A. I., Malygin A. Yu. Otsenka riskov vysokonadezhnoy biometrii: monogr. [Assessment of highly-reliable biometrics’ risks: monograph]. Almaty: Iz-vo KazNTU im. K. I. Satpaeva, 2014, 108 p.
8. Nadeev D. N. Neyrokomp'yutery: razrabotka, primenenie [Neurocomputers: development, application]. 2009, no. 6, pp. 53–55.
9. Funtikova Yu. V., Ivanov A. I., Zakharov O. S. Trudy nauchno-tekhnicheskoy konferentsii klastera penzenskikh predpriyatiy, obespechivayushchikh bezopasnost' informatsionnykh tekhnologiy [Proceedings of scientific and engineering conference of the cluster of Penza enterprises providing information technologies protection]. Penza, 2014, vol. 9, pp. 7–8. Available at: http://www.pniei.penza.ru/RV-conf/T9/S7.
10. Eykkhoff P. Osnovy identifikatsii sistem upravleniya [Basic control systems identification]. Moscow: Mir, 1975, 680 p.
11. Boll Rud, Konnel Dzhonatan Kh., Pankanti Sharat, Ratkha Nalini K., Sen'or Endryu U. Rukovodstvo po biometrii: per. s angl. [Biometrics guide: translation from English]. Moscow: Tekhnosfera, 2007, 368 p.

 

Date created: 30.06.2015 13:20
Date updated: 03.07.2015 14:56